Provenance Information in a Collaborative Knowledge Graph: An Evaluation of Wikidata External References

Authors

  • Alessandro Piscopo
  • Lucie-Aimée Kaffee
  • Chris Phethean
  • Elena Paslaru Bontas Simperl

Abstract

Wikidata is a collaboratively edited knowledge graph; it expresses knowledge in the form of subject-property-value triples, which can be enhanced with references to add provenance information. Understanding the quality of Wikidata is key to its widespread adoption as a knowledge resource. We analyse one aspect of Wikidata quality, provenance, in terms of the relevance and authoritativeness of its external references. We follow a two-stage approach. First, we perform a crowdsourced evaluation of references. Second, we use the judgements collected in the first stage to train a machine learning model that predicts reference quality at large scale. The features chosen for the models relate to reference editing and to the semantics of the triples the references support. Of the references evaluated, 61% were both relevant and authoritative. Bad references were often links that had changed and either stopped working or pointed to other pages. The machine learning models outperformed the baseline and accurately predicted non-relevant and non-authoritative references. Further work should focus on implementing our approach in Wikidata to help editors find bad references.
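To make the setup concrete, the following is a minimal sketch of a subject-property-value triple carrying external references, together with one way crowd judgements could be turned into labels for training. It is illustrative only: the class and function names (`Reference`, `Statement`, `majority`, `label`, `share_good`) are hypothetical, Wikidata's real data model (snaks, qualifiers, ranks) is considerably richer, and per-criterion majority voting is an assumed aggregation rule, not necessarily the one used in the paper. The identifiers `Q42`, `P31`, `Q5` and the mention of the "reference URL" property (P854) are standard Wikidata IDs used only as examples; the URLs are placeholders.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Reference:
    # External source backing a statement, e.g. a value of Wikidata's
    # "reference URL" property (P854). Simplified, hypothetical structure.
    url: str
    # One (relevant, authoritative) judgement pair per crowd worker.
    judgements: list[tuple[bool, bool]] = field(default_factory=list)


@dataclass
class Statement:
    subject: str  # item ID, e.g. "Q42"
    prop: str     # property ID, e.g. "P31" (instance of)
    value: str    # value, e.g. "Q5" (human)
    references: list[Reference] = field(default_factory=list)


def majority(votes: list[bool]) -> bool:
    # Strict majority; ties count as False (an assumed, conservative choice).
    counts = Counter(votes)
    return counts[True] > counts[False]


def label(ref: Reference) -> bool:
    """True if a majority judged the reference relevant and a majority judged it authoritative."""
    relevant = majority([r for r, _ in ref.judgements])
    authoritative = majority([a for _, a in ref.judgements])
    return relevant and authoritative


def share_good(statements: list[Statement]) -> float:
    # Fraction of all references whose aggregated label is "good".
    refs = [ref for s in statements for ref in s.references]
    if not refs:
        return 0.0
    return sum(label(ref) for ref in refs) / len(refs)


# Usage with placeholder data: one reference judged good, one judged not authoritative.
stmt = Statement("Q42", "P31", "Q5", [
    Reference("https://example.org/a", [(True, True), (True, True), (False, True)]),
    Reference("https://example.org/b", [(True, False), (False, False), (True, False)]),
])
print(share_good([stmt]))  # 0.5
```

Labels produced this way could serve as the targets for the second, machine learning stage; the features themselves are not specified here beyond the abstract's description.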

Similar resources

Opportunities and Challenges Presented by Wikidata in the Context of Biocuration

Wikidata is a world-readable and world-writable knowledge base maintained by the Wikimedia Foundation. It offers the opportunity to collaboratively construct a fully open access knowledge graph spanning biology, medicine, and all other domains of knowledge. To meet this potential, social and technical challenges must be overcome, many of which are familiar to the biocuration community. These include c...

Wembedder: Wikidata entity embedding web service

I present a web service for querying an embedding of entities in the Wikidata knowledge graph. The embedding is trained on the Wikidata dump using Gensim’s Word2Vec implementation and a simple graph walk. A REST API is implemented. Together with the Wikidata API the web service exposes a multilingual resource for over 600’000 Wikidata items and properties.

The Call for Recall

General-purpose knowledge bases (KBs) such as YAGO, Wikidata or the Google Knowledge Graph usually contain facts that have a high precision. In contrast, little is known about the recall of such KBs, and anecdotal evidence indicates that knowledge bases have a low recall on many topics. This project aims to develop techniques to evaluate and improve the recall of knowledge bases. We aim to use ...

From Freebase to Wikidata: The Great Migration

Collaborative knowledge bases that make their data freely available in a machine-readable form are central for the data strategy of many projects and organizations. The two major collaborative knowledge bases are Wikimedia’s Wikidata and Google’s Freebase. Due to the success of Wikidata, Google decided in 2014 to offer the content of Freebase to the Wikidata community. In this paper, we report ...

Classification of Knowledge Organization Systems with Wikidata

This paper presents a crowdsourced classification of knowledge organization systems based on the open knowledge base Wikidata. The focus is less on the current result, in its rather preliminary form, than on the environment and process of categorization in Wikidata and the extraction of KOS from the collaborative database. Benefits and disadvantages are summarized and discussed for application to kno...

Publication date: 2017